Where can I find the BAM file where reads are associated with bead barcodes and/or molecular barcodes?

  • By default, the Seeker pipeline does not generate a BAM file in which reads are associated with bead or molecular barcodes. However, the association can be generated by merging two of the pipeline’s intermediate outputs (r1-db and r2-db). The output is in parquet format, where each row is a read and the read ID and bead barcode sequence are defined for each row.
  • Merging r1-db and r2-db involves running a module named gen-merged-r1-r2-db in the Seeker Singularity container, following the steps below:
  • Find the two intermediate files in the work folder for the step PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB, located in:

${root_output_dir}/work/${step_hash}

 

The beginning of the ${step_hash} for this folder can be identified in this file:

${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt 

Below is an example of the beginning of ${step_hash} (yellow box) for PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB (red box) in the file execution_trace_${date}_${time}.txt:

 

[Screenshot: execution trace file with the step name boxed in red and the beginning of its hash boxed in yellow]
 
Once you have identified the value boxed in yellow, look for this folder: 

${root_output_dir}/work/${step_hash} 

Note: The value in yellow is only the beginning of the step_hash; use tab completion ([tab]) in your shell to expand it to the full path.
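
The lookup described above can also be scripted. The following is a minimal sketch, not part of the Seeker pipeline: the function name find_step_dir is hypothetical, and it assumes the trace file is tab-separated with header columns named hash and name. If the step ran for several samples, only the first matching row is used, so check the result against the trace file.

```shell
# Hypothetical helper: print the full work folder for a pipeline step.
# Assumes a tab-separated trace file whose header has "hash" and "name" columns.
find_step_dir() {
    trace_file="$1"   # execution_trace_${date}_${time}.txt
    work_root="$2"    # ${root_output_dir}/work
    step_name="$3"    # e.g. PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB

    # Read the column positions from the header, then print the hash of the
    # first row whose name contains the step name.
    short_hash=$(awk -F'\t' -v step="$step_name" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "hash") h = i
                if ($i == "name") n = i
            }
            next
        }
        h && n && index($n, step) { print $h; exit }
    ' "$trace_file")

    if [ -z "$short_hash" ]; then
        echo "step not found in trace: $step_name" >&2
        return 1
    fi

    # The trace holds only the beginning of the hash; the glob completes it,
    # which is what pressing [tab] does interactively.
    for dir in "$work_root"/"$short_hash"*; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done

    echo "no work folder matches $work_root/$short_hash*" >&2
    return 1
}
```

Example call: find_step_dir "${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt" "${root_output_dir}/work" "PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB"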

Copy your samplesheet.csv to this folder.  

 

Make sure that the ${Sample_ID}-r1-db, ${Sample_ID}-r2-db, and samplesheet.csv files for your sample of interest are located in this folder.
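
This check can be run before launching the merge. The sketch below is an illustration only: check_merge_inputs is a hypothetical helper name, and it tests only that the three entries exist in the folder, not that their contents are valid.

```shell
# Hypothetical helper: confirm the merge inputs are present in the work folder.
# Arguments: work folder, Sample_ID, samplesheet file name (default samplesheet.csv).
check_merge_inputs() {
    step_dir="$1"
    sample_id="$2"
    samplesheet="${3:-samplesheet.csv}"
    missing=0

    for item in "${sample_id}-r1-db" "${sample_id}-r2-db" "$samplesheet"; do
        # -e accepts either a file or a directory.
        if [ ! -e "$step_dir/$item" ]; then
            echo "missing: $step_dir/$item" >&2
            missing=1
        fi
    done

    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Example call from inside the work folder: check_merge_inputs . "${sample_id}"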

 

Run the command below (in the same folder): 

singularity exec ${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}

  • ${path_to_curioseekerv2_singularity_container}: You can find this path in the nextflow.config file (curioseeker-2.0.0/nextflow.config), as defined by the parameter curio_seeker_singularity. 
  • ${path_to_samplesheet}: the path to the samplesheet.csv you used to process this sample.
  • ${sample_id}: the Sample_ID used for processing this sample.
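
To look up the container path without opening the config file, a plain grep for the parameter name is enough. This is a minimal sketch: show_container_setting is a hypothetical helper name, and because the exact syntax of the line in nextflow.config is not assumed, the matching lines are printed as-is rather than parsed.

```shell
# Hypothetical helper: print the config line(s) that mention the container parameter.
# Argument: path to nextflow.config (defaults to curioseeker-2.0.0/nextflow.config).
show_container_setting() {
    config_file="${1:-curioseeker-2.0.0/nextflow.config}"
    # -n adds line numbers so the setting is easy to find in the file.
    grep -n 'curio_seeker_singularity' "$config_file"
}
```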

 

Example Command:  

singularity exec /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen

After a successful run, a folder named ${Sample_ID}-r1-r2-merged will be created in the same work folder. It contains chunked parquet files in which each row is a read:

  • Read ID is defined in column read1_id 
  • Bead barcode is defined in column BM  

 

Additionally, only rows where column r1_proper_structure_matched == True and column XS == Assigned should be included.

 

Troubleshooting: 

 

  • If the above command gives this error:  

 

Invalid value for '--samplesheet': Path 'samplesheet.csv' does not exist

 

Include the --bind flag as shown below to fix the issue:

singularity exec \
--bind ${root_samplesheet_folder} \
${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}

Example Command:  

singularity exec --bind /mnt/ /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen

Here, the --bind flag mounts a directory from the host machine (/mnt/) into the container, so that the container can access the contents of that directory, including the samplesheet.
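
If you are unsure which directory to bind, one option is to derive it from the samplesheet path. The sketch below is an assumption-laden illustration: bind_dir_for is a hypothetical helper name, and it prints the absolute directory that contains the samplesheet. Binding a higher-level folder such as /mnt/, as in the example above, works as well; if the samplesheet itself is a symbolic link, bind the folder that holds the real file.

```shell
# Hypothetical helper: print the absolute directory that contains a file,
# for use as the value of --bind.
bind_dir_for() {
    # The subshell keeps the caller's working directory unchanged;
    # pwd -P resolves symbolic links in the directory path.
    ( cd "$(dirname "$1")" && pwd -P )
}
```

Example use: singularity exec --bind "$(bind_dir_for "${path_to_samplesheet}")" followed by the remaining arguments shown above.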
